(54) Method and apparatus for adaption of models in e.g. speaker verification systems

(57) The invention relates to a method and an
arrangement for adapting models in speaker verification
systems or similar systems using models based on data
collected from a person during a certain time period. If
a simple model is utilized, a less reliable verification is
obtained, but if, on the other hand, a more complex
model is utilized, the problem is a long training period.
The present invention solves this problem by using a
plurality of different models in the same speaker
verification system. The verification is put into operation
using the model requiring a smaller amount of speech
data. During use, more speech data is collected
continuously. This material is used to further train either only
the more complex model or both the simpler model
already put into operation and the more complex model.
At suitable intervals, the performance capacities of the
models are compared. Once the more complex model
yields a more reliable verification result, it takes over
the verification in the operating situation. The
subsequent model unit may be put into operation either
instantaneously or gradually, e.g. by using a weight
function.
Description

FIELD OF THE INVENTION 

The present invention relates to a method and an
arrangement to adapt models in speaker verification
systems or similar systems using models based on data
collected from a person during a certain time period. The
collected data may be related to the physiology, behavior,
aging of the person etc. A related area is e.g. speaker
adaptive speech recognition. In systems of this type,
collected data is compared to a model for the verification
of the identity of the speaker or recognition of the speech
in order to control a course of events in a process or of
a device. For the model to be able to perform its task it
has to be trained with speech data. Simpler models
require less training but provide a less reliable result, while
more complex models require longer training and
provide a more reliable result of the verification.

The invention may be applied in all speaker
verification systems that are to be used on a plurality of
occasions, that is, where the speech of the same person is
to be verified on repeated occasions. As is known, speaker
verification systems are utilized in order to protect
information or economic values. The invention is an alternative
to the approach of using PIN codes in order to identify
a user. The voice recording as such may be effected
either directly at the equipment where the verification is
performed or transmitted by various media. The
medium may be telephony or other telecommunication
media.

STATE OF THE ART 

Prior art speaker verification systems have used
only one model, with the special problems
associated with that model. Thus, if a simple model has
been used, a less reliable verification has been
obtained. On the other hand, if a more complex model is
used, the problem is a long training period.

The present invention solves this problem by
utilizing a plurality of different models in the same speaker
verification system. The verification is put into operation
with the model requiring the smaller amount of speech data.
During use, more speech data is continuously
collected. This material is used to further train either only
the more complex model or both the simpler model
already in operation and the more complex model. At
suitable points of time, the performances of the models
are compared. When the more complex model
provides a more reliable verification result, it will take
over the verification in the operating situation.

It is recognized that the invention provides a
speaker verification system that is readily put
into operation but eventually provides increasingly
reliable verification results. The invention enables the
advantages of different models to be used while
the effect of their respective disadvantages is
minimized. Without this technology, one has to choose
a model with its associated advantages and
disadvantages at the start of a speaker verification system. By
shifting models, the system dynamically
adapts to the available amount of speech data.
This is a great advantage over the prior art.

SUMMARY OF THE INVENTION 

Thus, the present invention provides a method for 
adapting a model in e.g. speaker verification, compris- 
ing model units for receiving and evaluating speech. Ac- 
cording to the invention, speech data is collected and a 
first model unit is put into operation while a subsequent 
model unit is trained with speech data being collected 
during the operation of the first model unit. The perform- 
ances of the model units are tested and evaluated and 
a subsequent model unit is put into operation when the 
performance thereof has reached a suitable level. 

The subsequent model unit may be put into opera- 
tion either instantaneously or gradually, e.g. by using a 
weight function. 
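
A gradual handover of this kind can be pictured as a weighted combination of the scores from the two model units. The following Python sketch is only an illustration under assumptions not stated in the text: each model unit is assumed to return a numeric verification score, the weight is assumed to ramp linearly between two hypothetical performance thresholds, and the names `handover_weight` and `blended_score` are invented for the example.

```python
def handover_weight(performance, lower, upper):
    """Weight given to the subsequent model unit, in [0, 1].

    Assumption: a linear ramp between two hypothetical thresholds.
    Below `lower` the first unit decides alone; above `upper`
    the subsequent unit has taken over completely.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if performance <= lower:
        return 0.0
    if performance >= upper:
        return 1.0
    return (performance - lower) / (upper - lower)


def blended_score(first_score, subsequent_score, weight):
    """Combine the scores of the two model units with the given weight."""
    return (1.0 - weight) * first_score + weight * subsequent_score


# The subsequent unit is halfway between the two thresholds,
# so both units contribute equally to the combined score.
w = handover_weight(performance=0.75, lower=0.5, upper=1.0)
score = blended_score(first_score=0.4, subsequent_score=0.8, weight=w)
```

In this picture, an instantaneous handover corresponds to a weight that jumps from 0 to 1 at a single threshold.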

The invention also relates to an arrangement for 
performing the method. 

The invention is defined in detail in the accompany- 
ing claims. 

BRIEF DESCRIPTION OF THE DRAWING 

The invention will be described in detail below with
reference to the attached drawing, wherein the only
Figure is a schematic illustration of an embodiment of the
invention.

DETAILED DESCRIPTION OF A PREFERRED 
EMBODIMENT 

In speaker verification systems, that is, systems for the
automatic verification of the identity of a speaker, the
amount of speech data that has to be collected from the
user is a determining limitation of the use. Complex
speaker models requiring a large amount of collected
speech data may be expected to provide a better result
than models requiring a small amount of training
material. However, for a small amount of training material the
more complex model may yield a result inferior to that of
the simpler model.

Complex models having many parameters have
better performance than simpler models once the
parameters of the model have been estimated correctly.
However, for a correct estimation of the parameters a
large amount of training data is required. In the case
where the training data of a model is provided by a
customer, the amount of training data is a factor of
inconvenience for the customer. Poor performance within a
model will also lead to system errors, being another
factor of inconvenience for the customer. A problem that is
solved by the present invention is to find model
topologies having good performance with a minimum of
training data.

The solution proposed herewith to the problem of
both maximizing the performance of the model and
minimizing the requirement of training data is to use a model
system having a dynamic topology. The model has a
combination of model units or parts having varying
degrees of complexity. The effective topology of the model
is changed, such that for a given amount of training data
the optimum topology is used, based on the given model
unit. By using this technique, the effective complexity of
the model will grow with the available amount of training
data.

At the beginning of the service life of the model, the
simplest model units are used, requiring only a small
amount of data for a reliable estimation of their
parameters. As the amount of available data grows, the more
complex parts can be trained successively.

Once the parameters of the more complex unit have 
been estimated in a reliable way the performance there- 
of is probably better than that of the simpler unit and the 
topology of the model may be changed in favour of the 
complex unit. 
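
The choice of the operative unit can be sketched as follows. This Python fragment is a minimal illustration, not the implementation described in this document: the `ModelUnit` class, its `performance` field (assumed to be a number where higher is better) and the `select_operative` function are hypothetical names, and the units are assumed to be ordered from simplest to most complex.

```python
from dataclasses import dataclass


@dataclass
class ModelUnit:
    name: str
    performance: float  # hypothetical measure, higher is better


def select_operative(units, current_index=0):
    """Return the index of the unit that should be operative.

    Assumption: `units` is ordered from simplest to most complex, and
    a more complex unit takes over only when its measured performance
    is strictly better than that of the best unit found so far.
    """
    best = current_index
    for i in range(current_index + 1, len(units)):
        if units[i].performance > units[best].performance:
            best = i
    return best


units = [ModelUnit("simple", 0.80), ModelUnit("complex", 0.65)]
early = select_operative(units)   # complex unit is still undertrained

units[1].performance = 0.92       # after more speech data is collected
later = select_operative(units)   # complex unit takes over
```

With little training data the simple unit stays in operation; once the complex unit measures better, the topology changes in its favour.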

In the single Figure a speaker verification system in
accordance with the present invention is illustrated
schematically. The system comprises a control unit
controlling two switches and a number of model units
P1-Pn. The system receives speech or speech data as
input and supplies verification data as the output
signal.

The various model units P1-Pn of the speaker model
have different requirements of training data. A model
unit Pj should only be used for verification when it has
received sufficient training data. The units requiring a
smaller amount of data will be put into operation earlier,
while the more demanding units will not be used until a
longer training period has elapsed. In this way, the
performance of the speaker model may be enhanced
towards its full capacity. During the growth period the
model may still be used for verification by using the simpler
model units of the speaker model.

The simpler parts may be taken out of service as 
the more complex units achieve better performance. 

The shift to newer models may be effected over
several generations, so that more and more advanced
models requiring more speech data are continuously put into
operation. In this way, the speaker verification system
may be upgraded without being put out of operation. In
addition, it is contemplated that each model consists of
several submodels weighted together in various ways
to define a model.

When the speaker verification system is put into
operation for the very first time, it requires a short training
period to train the simplest model unit. The simplest model
unit can be trained from a speaker independent template.
Thereafter, the system is put into operation with
increasing performance in accordance with what is stated
above.



Each unit of the speaker model hierarchy will need
to store information relating to how well trained it is. This
information may be provided either by the model unit
itself or by some performance testing method. In the
former case, the information is called training level
while, in the latter case, the information is called
performance level. The training level is based on an
assumed a priori knowledge about how much training data
is needed by the unit. The difference between the two
kinds of information is that the performance level is
based on some evaluation of test data (a data base run),
while the training level is based on stored information
about used training data. The performance level may be
based on comparisons with other units of the speaker
model and even other speaker models.

Thresholds for the training level and the
performance level must be provided and stored in the control
unit. In the former case, the threshold is based on
previously made assumptions. For the latter, it should be
possible to base the value of the threshold on a criterion
of the performance demand.
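
The distinction between the two kinds of information, and the thresholds applied to them, might be sketched as below. The Python code is an illustration under stated assumptions only: the training level is assumed to be the fraction of an a priori required number of utterances, the performance level is assumed to be the fraction of correct accept/reject decisions in a test data base run, and all function names and threshold values are hypothetical.

```python
def training_level(utterances_used, utterances_needed):
    """Training level from stored information about used training data.

    Assumption: expressed as the fraction of an a priori required
    number of utterances, capped at 1.0.
    """
    if utterances_needed <= 0:
        raise ValueError("utterances_needed must be positive")
    return min(1.0, utterances_used / utterances_needed)


def performance_level(decisions, truths):
    """Performance level from a run over a test data base.

    Assumption: the fraction of accept/reject decisions that agree
    with the known truth for each test item.
    """
    if not truths or len(decisions) != len(truths):
        raise ValueError("decisions and truths must be non-empty and aligned")
    correct = sum(d == t for d, t in zip(decisions, truths))
    return correct / len(truths)


# Hypothetical thresholds, as they might be stored in the control unit.
TRAINING_THRESHOLD = 1.0       # based on a previously made assumption
PERFORMANCE_THRESHOLD = 0.9    # based on a performance demand

t_level = training_level(utterances_used=30, utterances_needed=40)
p_level = performance_level(
    decisions=[True, True, False, False, True],
    truths=[True, True, False, True, True],
)
ready_by_training = t_level >= TRAINING_THRESHOLD
ready_by_performance = p_level >= PERFORMANCE_THRESHOLD
```

In this example the unit has seen 30 of the 40 assumed utterances and agrees with the truth on four of five test items, so it passes neither threshold yet.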

In order to enable use of a performance level based
on data base simulation, it is necessary to include
management of such a data base. The speaker model
should also be able to state a value of its total training
level or performance level. This value may be used by
an application to estimate the significance level of a
decision taken by the verification system.

The performance of the model units is tested at
suitable intervals in order to check if they should be
operative or not. This may be effected cyclically or on a special
command.
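
One way to picture this check is a control unit that re-tests the model units after a fixed number of verification attempts, or when explicitly commanded. The Python sketch below rests on assumptions that are not part of the text: the `ControlUnit` class, the `check_every` interval, the single performance threshold and the `evaluate` callback are all hypothetical.

```python
class ControlUnit:
    """Re-tests the model units cyclically or on a special command."""

    def __init__(self, evaluate, threshold, check_every=10):
        # `evaluate` is a hypothetical callback returning a dict that
        # maps unit names to performance levels (higher is better).
        self.evaluate = evaluate
        self.threshold = threshold
        self.check_every = check_every
        self.attempts = 0
        self.operative = set()

    def check_now(self):
        """Special command: test all units and update the operative set."""
        levels = self.evaluate()
        self.operative = {
            name for name, level in levels.items() if level >= self.threshold
        }
        return self.operative

    def on_verification_attempt(self):
        """Cyclic check: re-test after every `check_every` attempts."""
        self.attempts += 1
        if self.attempts % self.check_every == 0:
            self.check_now()


levels = {"P1": 0.85, "P2": 0.40}
control = ControlUnit(lambda: dict(levels), threshold=0.8, check_every=3)
first_check = set(control.check_now())   # only P1 qualifies at first

levels["P2"] = 0.90                      # P2 has been trained further
for _ in range(3):
    control.on_verification_attempt()    # third attempt triggers a re-test
```

Counting verification attempts is only one possible notion of a cycle; a timer or a count of collected utterances would serve equally well in this sketch.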

The invention has been described with reference to
a speaker verification system but, as is mentioned
above, the invention may equally be applied in other
systems using models based on data collected from a
person during a certain time period, e.g. speaker
adaptive speech recognition systems. The invention is only
limited by the claims below.

Claims 

1. Method for adapting a model in e.g. speaker
verification systems, comprising model units for
receiving and evaluating speech, characterized by
collecting speech data and putting a first model unit
(P1) into operation, training a subsequent model
unit (Pn) with speech data collected during the
operation of the first model unit, testing and evaluating
the performance capacities of the model units and
putting the subsequent model unit into operation
once the performance capacity thereof has reached
a suitable level.

2. Method in accordance with claim 1, characterized
by putting the subsequent model unit into operation
instantaneously once the performance capacity
thereof exceeds a predetermined threshold.

3. Method in accordance with claim 1, characterized
by putting the subsequent model unit into operation
gradually, once the performance capacity thereof
exceeds a respective threshold of a plurality of
predetermined thresholds.

4. Method in accordance with claim 3, characterized
by putting the subsequent model unit into operation
gradually by weighting the various model units with
a variable weight function.

5. Method in accordance with any one of the previous
claims, characterized by connecting a new model
unit as a subsequent model.

6. Method in accordance with any one of the previous
claims, characterized by training all model units with
collected speech data.

7. Method in accordance with any one of claims 1 to 6,
characterized by training all model units except the
respective operative model units with collected
speech data.

8. Arrangement for adapting a model in e.g. speaker
verification systems, comprising model units for
receiving and evaluating speech and a control unit,
characterized by a first switch for directing speech
data to the various model units (P1-Pn), a second
switch for directing verification data from the various
model units (P1-Pn), said switches being controlled
by the control unit such that the model units collect
speech data and that a first model unit (P1) is put
into operation, a subsequent model unit (Pn) is
trained with speech data collected during the
operation of the first model unit, that the performance
capacities of the model units are tested and evaluated
and that the subsequent model unit is put into
operation once the performance capacity thereof has
reached a suitable level.

9. Arrangement in accordance with claim 8,
characterized in that a predetermined threshold is stored in
the control unit in order to put the subsequent model
unit into operation instantaneously, once the
performance capacity thereof exceeds the
predetermined threshold.

10. Arrangement in accordance with claim 8,
characterized in that a plurality of predetermined thresholds
are stored in the control unit in order to put the
subsequent model unit into operation gradually, once
the performance capacity thereof exceeds a
respective threshold of the predetermined plurality of
thresholds.

11. Arrangement in accordance with claim 10,
characterized in that the control unit comprises a variable
weight function to put the subsequent model unit
into operation gradually by weighting the different
models with the weight function.

12. Arrangement in accordance with any one of claims
8 to 11, characterized in that a model unit consists
of submodels.